Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500

Authors

  • Christopher Krauß
  • Xuan Anh Do
  • Nicolas Huck
Abstract

In recent years, machine learning research has gained momentum: New developments in the field of deep learning allow for multiple levels of abstraction and are starting to supersede well-known and powerful tree-based techniques mainly operating on the original feature space. All these methods can be applied to various fields, including finance. This article implements and analyses the effectiveness of deep neural networks (DNN), gradient-boosted trees (GBT), random forests (RAF), and several ensembles of these methods in the context of statistical arbitrage. Each model is trained on lagged returns of all stocks in the S&P 500, after elimination of survivor bias. From 1992 to 2015, daily one-day-ahead trading signals are generated based on the probability forecast of a stock to outperform the general market. The highest k probabilities are converted into long and the lowest k probabilities into short positions, thus censoring the less certain middle part of the ranking. Empirical findings are promising. A simple, equal-weighted ensemble (ENS1) consisting of one deep neural network, one gradient-boosted tree, and one random forest produces out-of-sample returns exceeding 0.45 percent per day for k = 10, prior to transaction costs. Irrespective of the fact that profits are declining in recent years, our findings pose a severe challenge to the semi-strong form of market efficiency.
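The ranking rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of NumPy, and the definition of the daily portfolio return (mean return of the long leg minus mean return of the short leg, before transaction costs) are assumptions made only for illustration. The sketch averages three probability forecasts with equal weights, as ENS1 is described, and then goes long the k highest and short the k lowest probabilities.

```python
import numpy as np


def ensemble_probability(p_dnn, p_gbt, p_raf):
    """Equal-weighted average of three models' probability forecasts.

    Each argument holds one probability per stock (hypothetical inputs).
    """
    return (np.asarray(p_dnn) + np.asarray(p_gbt) + np.asarray(p_raf)) / 3.0


def select_positions(prob, k):
    """Return (long_idx, short_idx) for the k highest and k lowest probabilities.

    Stocks in the middle of the ranking are left out of the portfolio.
    """
    prob = np.asarray(prob)
    if k < 1 or 2 * k > prob.size:
        raise ValueError("k must satisfy 1 <= k <= number of stocks / 2")
    order = np.argsort(prob, kind="stable")  # ascending by probability
    short_idx = order[:k]                    # least likely to outperform
    long_idx = order[-k:][::-1]              # most likely to outperform, best first
    return long_idx, short_idx


def long_short_return(next_day_returns, long_idx, short_idx):
    """Assumed return definition: mean long return minus mean short return."""
    r = np.asarray(next_day_returns)
    return r[long_idx].mean() - r[short_idx].mean()


# Toy data for six hypothetical stocks (made-up numbers, not results from the paper).
p_dnn = [0.60, 0.40, 0.55, 0.30, 0.70, 0.50]
p_gbt = [0.62, 0.45, 0.50, 0.35, 0.65, 0.48]
p_raf = [0.58, 0.41, 0.51, 0.31, 0.69, 0.52]
next_day = [0.01, -0.02, 0.00, -0.01, 0.03, 0.005]

prob = ensemble_probability(p_dnn, p_gbt, p_raf)
longs, shorts = select_positions(prob, k=2)
daily_return = long_short_return(next_day, longs, shorts)
```

With k = 2, the sketch goes long the two stocks with the highest averaged probability and short the two with the lowest, leaving the two in the middle untraded. The paper reports results for k = 10 on the S&P 500 constituents; the toy universe here is far smaller.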


Similar resources

An Empirical Comparison of Supervised Learning Algorithms Using Different Performance Metrics

We present results from a large-scale empirical comparison between ten learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. We evaluate the methods on binary classification problems using nine performance criteria: accuracy, squared error, cross-entropy, ROC Area, F-score, p...


Event Nugget Detection, Classification and Coreference Resolution using Deep Neural Networks and Gradient Boosted Decision Trees

For the shared task of event nugget detection at TAC 2015 we trained a deep feed forward network achieving an official F1-score of 65.31% for plain annotations, 55.56% for event mention type and 49.16% for the realis value. For the task of Event Coreference Resolution we prototyped a simple baseline using Gradient Boosted Decision Trees achieving an overall average CoNLL score of 70.02%. Our co...


Convex Variational Formulations for Learning Problems

In this article, we introduce new techniques to solve the nonlinear regression problem and the nonlinear classification problem. Our benchmarks suggest that our method for regression is significantly more effective when compared to classical methods and our method for classification is competitive. Our list of classical methods includes least squares, random forests, decision trees, boosted tre...


Interpreting Tree Ensembles with inTrees

Tree ensembles such as random forests and boosted trees are accurate but difficult to understand, debug and deploy. In this work, we provide the inTrees (interpretable trees) framework that extracts, measures, prunes and selects rules from a tree ensemble, and calculates frequent variable interactions. A rule-based learner, referred to as the simplified tree ensemble learner (STEL), can also b...


Boosted trees for ecological modeling and prediction.

Accurate prediction and explanation are fundamental objectives of statistical analysis, yet they seldom coincide. Boosted trees are a statistical learning method that attains both of these objectives for regression and classification analyses. They can deal with many types of response variables (numeric, categorical, and censored), loss functions (Gaussian, binomial, Poisson, and robust), and p...



Journal:
  • European Journal of Operational Research

Volume 259  Issue 

Pages  -

Publication date 2017